Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.
File: internals.info, Node: Modules for Internationalization, Prev: Modules for Interfacing with X Windows, Up: A Summary of the Various XEmacs Modules
Modules for Internationalization
================================
   size  name
-------  ---------------------
  42836  mule-canna.c
  16737  mule-ccl.c
  41080  mule-charset.c
  30176  mule-charset.h
 146844  mule-coding.c
  16588  mule-coding.h
   6996  mule-mcpath.c
   2899  mule-mcpath.h
  57158  mule-wnnfns.c
   3351  mule.c
These files implement the MULE (Asian-language) support. Note that
MULE actually provides a general interface for all sorts of languages,
not just Asian languages (although they are generally the most
complicated to support). This code is still in beta.
`mule-charset.*' and `mule-coding.*' provide the heart of the XEmacs
MULE support. `mule-charset.*' implements the "charset" Lisp object
type, which encapsulates a character set (an ordered one- or
two-dimensional set of characters, such as US ASCII or JISX0208 Japanese
Kanji).
`mule-coding.*' implements the "coding-system" Lisp object type,
which encapsulates a method of converting between different encodings.
An encoding is a representation of a stream of characters, possibly
from multiple character sets, using a stream of bytes or words, and
defines (e.g.) which escape sequences are used to specify particular
character sets, how the indices for a character are converted into bytes
(sometimes this involves setting the high bit; sometimes complicated
rearranging of the values takes place, as in the Shift-JIS encoding),
etc.
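As a rough illustration of the two kinds of index-to-byte conversion
just mentioned, the following sketch converts the two position bytes of
a JISX0208 character into EUC-JP bytes (the high bit is set on each
byte) and into Shift-JIS bytes (the values are rearranged). This is
not code from `mule-coding.c'; the function names `jis_to_euc' and
`jis_to_sjis' are made up for the example.

```c
#include <assert.h>

/* Hypothetical helpers, not part of XEmacs.  J1 and J2 are the two
   7-bit position bytes of a JISX0208 character, each in the range
   0x21..0x7E.  */

/* EUC-JP: simply set the high bit of each byte.  */
static void
jis_to_euc (unsigned char j1, unsigned char j2,
            unsigned char *e1, unsigned char *e2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  *e1 = (unsigned char) (j1 | 0x80);
  *e2 = (unsigned char) (j2 | 0x80);
}

/* Shift-JIS: two JIS rows are folded into one lead byte, and the
   trail byte lands in one of two ranges depending on whether the
   row byte was odd or even.  */
static void
jis_to_sjis (unsigned char j1, unsigned char j2,
             unsigned char *s1, unsigned char *s2)
{
  assert (j1 >= 0x21 && j1 <= 0x7E && j2 >= 0x21 && j2 <= 0x7E);
  *s1 = (unsigned char) (((j1 + 1) >> 1) + (j1 < 0x5F ? 0x70 : 0xB0));
  if (j1 & 1)
    /* Odd row byte: trail byte in 0x40..0x9E, skipping 0x7F.  */
    *s2 = (unsigned char) (j2 + (j2 < 0x60 ? 0x1F : 0x20));
  else
    /* Even row byte: trail byte in 0x9F..0xFC.  */
    *s2 = (unsigned char) (j2 + 0x7E);
}
```

For the JISX0208 position 0x30 0x21, `jis_to_euc' yields 0xB0 0xA1 and
`jis_to_sjis' yields 0x88 0x9F.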
`mule-ccl.c' provides the CCL (Code Conversion Language)
interpreter. CCL is similar in spirit to Lisp byte code and is used to
implement converters for custom encodings.
`mule-canna.c' and `mule-wnnfns.c' implement interfaces to external
programs used to implement the Canna and WNN input methods,
respectively. This is currently in beta.
`mule-mcpath.c' provides some functions to allow for pathnames
containing extended characters. This code is fragmentary, obsolete, and
completely non-working. Instead, PATHNAME-CODING-SYSTEM is used to
specify conversions of names of files and directories. The standard C
I/O functions like `open()' are wrapped so that conversion occurs
automatically.
`mule.c' provides a few miscellaneous things that should probably be
elsewhere.
9400 intl.c
This provides some miscellaneous internationalization code for
implementing message translation and interfacing to the Ximp input
method. None of this code is currently working.
1764 iso-wide.h
This contains leftover code from an earlier implementation of
Asian-language support, and is not currently used.
File: internals.info, Node: Allocation of Objects in XEmacs Lisp, Next: Events and the Event Loop, Prev: A Summary of the Various XEmacs Modules, Up: Top
Allocation of Objects in XEmacs Lisp
************************************
* Menu:
* Introduction to Allocation::
* Garbage Collection::
* GCPROing::
* Integers and Characters::
* Allocation from Frob Blocks::
* lrecords::
* Low-level allocation::
* Pure Space::
* Cons::
* Vector::
* Bit Vector::
* Symbol::
* Marker::
* String::
* Bytecode::
File: internals.info, Node: Introduction to Allocation, Next: Garbage Collection, Up: Allocation of Objects in XEmacs Lisp
Introduction to Allocation
==========================
Emacs Lisp, like all Lisps, has garbage collection. This means that
the programmer never has to explicitly free (destroy) an object; it
happens automatically when the object becomes inaccessible. Most
experts agree that garbage collection is a necessity in a modern,
high-level language. Its omission from C stems from the fact that C was
originally designed to be a nice abstract layer on top of assembly
language, for writing kernels and basic system utilities rather than
large applications.
Lisp objects can be created by any of a number of Lisp primitives.
Most object types have one or a small number of basic primitives for
creating objects. For conses, the basic primitive is `cons'; for
vectors, the primitives are `make-vector' and `vector'; for symbols,
the primitives are `make-symbol' and `intern'; etc. Some Lisp objects,
especially those that are primarily used internally, have no
corresponding Lisp primitives. Every Lisp object, though, has at least
one C primitive for creating it.
Recall from section (VII) that a Lisp object, as stored in a 32-bit
or 64-bit word, has a mark bit, a few tag bits, and a "value" that
occupies the remainder of the bits. We can separate the different Lisp
object types into four broad categories:
* (a) Those for whom the value directly represents the contents of
the Lisp object. Only two types are in this category: integers and
characters. No special allocation or garbage collection is
necessary for such objects. Lisp objects of these types do not
need to be `GCPRO'ed.
In the remaining three categories, the value is a pointer to a
structure.
* (b) Those for whom the tag directly specifies the type. Recall
that there are only three tag bits; this means that at most five
types can be specified this way. The most commonly-used types are
stored in this format; this includes conses, strings, vectors, and
sometimes symbols. With the exception of vectors, objects in this
category are allocated in "frob blocks", i.e. large blocks of
memory that are subdivided into individual objects. This saves a
lot on malloc overhead, since there are typically quite a lot of
these objects around, and the objects are small. (A cons, for
example, occupies 8 bytes on 32-bit machines - 4 bytes for each of
the two objects it contains.) Vectors are individually
`malloc()'ed since they are of variable size. (It would be
possible, and desirable, to allocate vectors of certain small
sizes out of frob blocks, but it isn't currently done.) Strings
are handled specially: Each string is allocated in two parts, a
fixed size structure containing a length and a data pointer, and
the actual data of the string. The former structure is allocated
in frob blocks as usual, and the latter data is stored in "string
chars blocks" and is relocated during garbage collection to
eliminate holes.
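The frob-block idea can be sketched roughly as follows. This is a toy
model only: the names (`toy_cons', `toy_block', `toy_alloc_cons',
`toy_free_cons') and the block size are invented for the example and
are not the ones used in `alloc.c'. Objects are carved one at a time
out of a large `malloc()'ed block, and freed objects go on a free list
that is consulted before a new slot is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model; all names here are illustrative, not XEmacs's.  */
struct toy_cons { void *car, *cdr; };

#define OBJS_PER_BLOCK 256

struct toy_block
{
  struct toy_block *prev;                  /* chain of all blocks */
  struct toy_cons objs[OBJS_PER_BLOCK];    /* the objects themselves */
};

static struct toy_block *current_block;    /* block being carved up */
static int current_index = OBJS_PER_BLOCK; /* next unused slot in it */
static struct toy_cons *free_list;         /* freed objects, linked via car */
static int blocks_allocated;

static struct toy_cons *
toy_alloc_cons (void)
{
  /* Reuse a freed object if there is one.  */
  if (free_list)
    {
      struct toy_cons *c = free_list;
      free_list = (struct toy_cons *) c->car;
      return c;
    }
  /* Otherwise take the next slot, getting a new block when the
     current one is used up.  One malloc() serves many objects.  */
  if (current_index == OBJS_PER_BLOCK)
    {
      struct toy_block *b = malloc (sizeof *b);
      if (b == NULL)
        abort ();
      b->prev = current_block;
      current_block = b;
      current_index = 0;
      blocks_allocated++;
    }
  return &current_block->objs[current_index++];
}

static void
toy_free_cons (struct toy_cons *c)
{
  assert (c != NULL);
  c->car = free_list;          /* thread the free list through car */
  free_list = c;
}
```

With these definitions, the first 256 calls to `toy_alloc_cons' cost a
single `malloc()', which is where the savings described above come from.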
In the remaining two categories, the type is stored in the object
itself. The tag for all such objects is the generic "lrecord"
(Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines)
of the object's structure are a pointer to a structure that describes
the object's type, which includes method pointers and a pointer to a
string naming the type. Note that it's possible to save some space by
using a one- or two-byte tag, rather than a four- or eight-byte pointer
to store the type, but it's not clear it's worth making the change.
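A minimal sketch of this layout, under invented names (`toy_lrecord_header',
`toy_implementation', `toy_float', `toy_marker' are assumptions made
for the example, not the XEmacs declarations), might look like this.
The object begins with a pointer to a per-type structure that holds a
name and method pointers, and a type test is a comparison of that
pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_lrecord_header;

/* One of these exists per object type.  */
struct toy_implementation
{
  const char *name;                                    /* names the type */
  void (*finalize) (struct toy_lrecord_header *obj);   /* a method pointer */
};

/* Every object of this kind begins with this header.  */
struct toy_lrecord_header
{
  const struct toy_implementation *implementation;
};

struct toy_float
{
  struct toy_lrecord_header header;   /* must be the first member */
  double value;
};

struct toy_marker
{
  struct toy_lrecord_header header;
  long position;
};

static const struct toy_implementation toy_float_impl = { "float", NULL };
static const struct toy_implementation toy_marker_impl = { "marker", NULL };

static void
toy_init_float (struct toy_float *f, double value)
{
  f->header.implementation = &toy_float_impl;
  f->value = value;
}

static void
toy_init_marker (struct toy_marker *m, long position)
{
  m->header.implementation = &toy_marker_impl;
  m->position = position;
}

/* The type test compares the implementation pointer.  */
static int
toy_floatp (const struct toy_lrecord_header *h)
{
  assert (h != NULL && h->implementation != NULL);
  return h->implementation == &toy_float_impl;
}

static const char *
toy_type_name (const struct toy_lrecord_header *h)
{
  assert (h != NULL && h->implementation != NULL);
  return h->implementation->name;
}
```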
* (c) Those lrecords that are allocated in frob blocks (see above).
This includes the objects that are most common and relatively
small, and includes floats, bytecodes, symbols (when not in
category (b)), extents, events, and markers. With the cleanup of
frob blocks done in 19.12, it's not terribly hard to add more
objects to this category, but it's a bit trickier than adding an
object type to type (d) (esp. if the object needs a finalization
method), and is not likely to save much space unless the object is
small and there are many of them. (In fact, if there are very few
of them, it might actually waste space.)
* (d) Those lrecords that are individually `malloc()'ed. These are
called "lcrecords". All other types are in this category. Adding
a new type to this category is comparatively easy, and all types
added since 19.8 (when the current allocation scheme was devised,
by Richard Mlynarik), with the exception of the character type,
have been in this category.
Note that bit vectors are a bit of a special case. They are simple
lrecords as in category (c), but are individually `malloc()'ed like
vectors. You can basically view them as exactly like vectors except
that their type is stored in lrecord fashion rather than in
directly-tagged fashion.
Note that FSF Emacs redesigned their object system in 19.29 to follow
a similar scheme. However, given RMS's expressed dislike for data
abstraction, the FSF scheme is not nearly as clean or as easy to
extend. (FSF calls items of type (c) `Lisp_Misc' and items of type (d)
`Lisp_Vectorlike', with separate tags for each, although
`Lisp_Vectorlike' is also used for vectors.)
File: internals.info, Node: Garbage Collection, Next: GCPROing, Prev: Introduction to Allocation, Up: Allocation of Objects in XEmacs Lisp
Garbage Collection
==================
Garbage collection is simple in theory but tricky to implement.
Emacs Lisp uses the oldest garbage collection method, called "mark and
sweep". Garbage collection begins by starting with all accessible
locations (i.e. all variables and other slots where Lisp objects might
occur) and recursively traversing all objects accessible from those
slots, marking each one that is found. We then go through all of
memory, freeing each object that is not marked and unmarking each
object that is marked. Note that "all of memory" means all currently
allocated objects. Traversing all these objects means traversing all
frob blocks, all vectors (which are chained in one big list), and all
lcrecords (which are likewise chained).
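The two phases can be sketched with a toy collector. Everything here
is an assumption made for illustration (the names `toy_obj',
`toy_mark', `toy_sweep', `toy_gc', and the explicit mark field); the
real collector stores marks differently for each type. All allocated
objects are kept on one chain, much as lcrecords are, so that the sweep
phase can visit every one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_obj
{
  int marked;
  struct toy_obj *car, *cdr;   /* references to other objects, or NULL */
  struct toy_obj *chain;       /* links every allocated object */
};

static struct toy_obj *all_objects;   /* head of the chain */
static int live_count;

static struct toy_obj *
toy_new (struct toy_obj *car, struct toy_obj *cdr)
{
  struct toy_obj *o = malloc (sizeof *o);
  if (o == NULL)
    abort ();
  o->marked = 0;
  o->car = car;
  o->cdr = cdr;
  o->chain = all_objects;
  all_objects = o;
  live_count++;
  return o;
}

/* Mark phase: recursively mark everything reachable from O.  The
   check for an existing mark also stops the recursion on cycles.  */
static void
toy_mark (struct toy_obj *o)
{
  if (o == NULL || o->marked)
    return;
  o->marked = 1;
  toy_mark (o->car);
  toy_mark (o->cdr);
}

/* Sweep phase: free unmarked objects, unmark the survivors.  */
static void
toy_sweep (void)
{
  struct toy_obj **link = &all_objects;
  while (*link)
    {
      struct toy_obj *o = *link;
      if (o->marked)
        {
          o->marked = 0;
          link = &o->chain;
        }
      else
        {
          *link = o->chain;
          free (o);
          live_count--;
        }
    }
}

static void
toy_gc (struct toy_obj **roots, int nroots)
{
  int i;
  assert (nroots >= 0);
  for (i = 0; i < nroots; i++)
    toy_mark (roots[i]);
  toy_sweep ();
}
```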
Note that, when an object is marked, the mark has to occur inside of
the object's structure, rather than in the 32-bit or 64-bit
`Lisp_Object' holding the object's pointer; i.e. you can't just set
the pointer's mark bit.
This is because there may be many pointers to the same object. This
means that the method of marking an object can differ depending on the
type. The different marking methods are approximately as follows:
1. For conses, the mark bit of the car is set.
2. For strings, the mark bit of the string's plist is set.
3. For symbols when not lrecords, the mark bit of the symbol's plist
is set.
4. For vectors, the length is negated after adding 1.
5. For lrecords, the pointer to the structure describing the type is
changed (see below).
6. Integers and characters do not need to be marked, since no
allocation occurs for them.
The details of this are in the `mark_object()' function.
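As a small worked example of method 4, here is a sketch of marking a
vector through its length field. The structure and function names are
invented for the example. Adding 1 before negating means that even a
zero-length vector ends up with a negative length, so a marked vector
can always be told apart from an unmarked one, and the original length
can be recovered exactly.

```c
#include <assert.h>

/* Toy vector header; the name and field are illustrative only.  */
struct toy_vector
{
  long size;               /* number of elements; negative when marked */
};

static void
toy_mark_vector (struct toy_vector *v)
{
  assert (v->size >= 0);   /* must not already be marked */
  v->size = -(v->size + 1);
}

static int
toy_vector_marked_p (const struct toy_vector *v)
{
  return v->size < 0;
}

static void
toy_unmark_vector (struct toy_vector *v)
{
  assert (v->size < 0);    /* must be marked */
  v->size = -v->size - 1;
}
```

Any code that reads `size' while the vector is marked sees the munged
value, which is why such code has to be careful during garbage
collection.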
Note that any code that operates during garbage collection has to be
especially careful because of the fact that some objects may be marked
and as such may not look like they normally do. In particular:
* Some object pointers may have their mark bit set. This will make
`FOOBARP()' predicates fail. Use `GC_FOOBARP()' to deal with this.
* Even if you clear the mark bit, `FOOBARP()' will still fail for
lrecords because the implementation pointer has been changed (see
below). `GC_FOOBARP()' will correctly deal with this.
* Vectors have their size field munged, so anything that looks at
this field will fail.
* Note that `XFOOBAR()' macros *will* work correctly on object
pointers with their mark bit set, because the logical shift
operations that remove the tag also remove the mark bit.
Finally, note that garbage collection can be invoked explicitly by
calling `garbage-collect' but is also called automatically by `eval',
once a certain amount of memory has been allocated since the last
garbage collection (according to `gc-cons-threshold').
File: internals.info, Node: GCPROing, Next: Integers and Characters, Prev: Garbage Collection, Up: Allocation of Objects in XEmacs Lisp
`GCPRO'ing
==========
`GCPRO'ing is one of the ugliest and trickiest parts of Emacs
internals. The basic idea is that whenever garbage collection occurs,
all in-use objects must be reachable somehow or other from one of the
roots of accessibility. The roots of accessibility are:
1. All objects that have been `staticpro()'d. This is used for any
global C variables that hold Lisp objects. A call to
`staticpro()' happens implicitly as a result of any symbols
declared with `defsymbol()' and any variables declared with
`DEFVAR_FOO()'. You need to explicitly call `staticpro()' (in the
`vars_of_foo()' method of a module) for other global C variables
holding Lisp objects. (This typically includes internal lists and
such things.)
Note that `obarray' is one of the `staticpro()'d things.
Therefore, all functions and variables get marked through this.
2. Any shadowed bindings that are sitting on the specpdl stack.
3. Any objects sitting in currently active (Lisp) stack frames,
catches, and condition cases.
4. A couple of special-case places where active objects are located.
5. Anything currently marked with `GCPRO'.
Marking with `GCPRO' is necessary because some C functions (quite a
lot, in fact) allocate objects during their operation. Quite
frequently, there will be no other pointer to the object while the
function is running, and if a garbage collection occurs and the object
needs to be referenced again, bad things will happen. The solution is
to mark those objects with `GCPRO'. Unfortunately this is easy to
forget, and there is basically no way around this problem. Here are
some rules, though:
1. For every `GCPRON', there have to be declarations of `struct gcpro
gcpro1, gcpro2', etc.
2. You *must* `UNGCPRO' anything that's `GCPRO'ed, and you *must not*
`UNGCPRO' if you haven't `GCPRO'ed. Getting either of these wrong
will lead to crashes, often in completely random places unrelated
to where the problem lies.
3. The way this actually works is that all currently active `GCPRO's
are chained through the `struct gcpro' local variables, with the
variable `gcprolist' pointing to the head of the list and the nth
local `gcpro' variable pointing to the first `gcpro' variable in
the next enclosing stack frame. Each `GCPRO'ed thing is an
lvalue, and the `struct gcpro' local variable contains a pointer to
this lvalue. This is why things will mess up badly if you don't
pair up the `GCPRO's and `UNGCPRO's - you will end up with
`gcprolist's containing pointers to `struct gcpro's or local
`Lisp_Object' variables in no-longer-active stack frames.
4. It is actually possible for a single `struct gcpro' to protect a
contiguous array of any number of values, rather than just a
single lvalue. To effect this, call `GCPRON' as usual on the
first object in the array and then set `gcpron.nvars'.
5. *Strings are relocated.* What this means in practice is that the
pointer obtained using `XSTRING_DATA()' is liable to change at any
time, and you should never keep it around past any function call,
or pass it as an argument to any function that might cause a
garbage collection. This is why a number of functions accept
either a "non-relocatable" `char *' pointer or a relocatable Lisp
string, and only access the Lisp string's data at the very last
minute. In some cases, you may end up having to `alloca()' some
space and copy the string's data into it.
6. By convention, if you have to nest `GCPRO's, use `NGCPRON' (along
with `struct gcpro ngcpro1, ngcpro2', etc.), `NNGCPRON', etc.
This avoids compiler warnings about shadowed locals.
7. It is *always* better to err on the side of extra `GCPRO's rather
than too few. The extra cycles spent on this are almost never
going to make a whit of difference in the speed of anything.
8. The general rule to follow is that caller, not callee, `GCPRO's.
That is, you should not have to explicitly `GCPRO' any Lisp objects
that are passed in as parameters, but if you create any Lisp
objects (remember, this happens in all sorts of circumstances,
e.g. with `Fcons()', etc.), you are responsible for `GCPRO'ing the
objects unless you are *absolutely sure* that there's no
possibility that a garbage-collection can occur while you need to
use the object. Even then, consider `GCPRO'ing.
9. A garbage collection can occur whenever anything calls `Feval', or
whenever a QUIT can occur where execution can continue past this.
(Remember, this is almost anywhere.)
10. If you have the *least smidgeon of doubt* about whether you need
to `GCPRO', you should `GCPRO'.
11. Beware of `GCPRO'ing something that is uninitialized. If you have
any shade of doubt about this, initialize all your variables to
`Qnil'.
12. Be careful of traps, like calling `Fcons()' in the argument to
another function. By the "caller protects" law, you should be
`GCPRO'ing the newly-created cons, but you aren't. A certain
number of functions that are commonly called on freshly created
stuff (e.g. `nconc2()', `Fsignal()') break the "caller protects"
law and go ahead and `GCPRO' their arguments so as to simplify
things, but make sure to check whether it's OK whenever doing
something like this.
13. Once again, remember to `GCPRO'! Bugs resulting from insufficient
`GCPRO'ing are intermittent and extremely difficult to track down,
often showing up in crashes inside of `garbage-collect' or in
weirdly corrupted objects or even in incorrect values in a totally
different section of code.
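The chaining described in rule 3 can be modeled in a few lines. What
follows is a simplified stand-alone sketch, not the XEmacs macros: the
`TOY_' and `toy_' names are invented, `Toy_Object' stands in for
`Lisp_Object', and the exact order of the links within one stack frame
is a choice made for the example. It shows the local `struct'
variables being linked onto a global list, an array being protected by
setting `nvars' as in rule 4, and the list being restored on the way
out.

```c
#include <assert.h>
#include <stddef.h>

typedef long Toy_Object;        /* stand-in for Lisp_Object */

struct toy_gcpro
{
  struct toy_gcpro *next;       /* next entry on the chain */
  Toy_Object *var;              /* address of the protected lvalue */
  int nvars;                    /* number of contiguous values at VAR */
};

static struct toy_gcpro *toy_gcprolist;   /* head of the chain */

/* These expect locals named gcpro1 (and gcpro2) to be declared.  */
#define TOY_GCPRO1(v)                                           \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v),              \
   gcpro1.nvars = 1, toy_gcprolist = &gcpro1)

#define TOY_GCPRO2(v1, v2)                                      \
  (gcpro1.next = toy_gcprolist, gcpro1.var = &(v1),             \
   gcpro1.nvars = 1,                                            \
   gcpro2.next = &gcpro1, gcpro2.var = &(v2),                   \
   gcpro2.nvars = 1, toy_gcprolist = &gcpro2)

#define TOY_UNGCPRO (toy_gcprolist = gcpro1.next)

/* What a collector would do: walk the chain to find the roots.  */
static int
toy_count_protected (void)
{
  int n = 0;
  struct toy_gcpro *g;
  for (g = toy_gcprolist; g; g = g->next)
    {
      assert (g->var != NULL && g->nvars >= 1);
      n += g->nvars;
    }
  return n;
}

static int
toy_inner (void)
{
  Toy_Object args[3] = { 0, 0, 0 };
  struct toy_gcpro gcpro1;
  int seen;

  TOY_GCPRO1 (args[0]);
  gcpro1.nvars = 3;             /* protect the whole array (rule 4) */
  seen = toy_count_protected ();
  TOY_UNGCPRO;
  return seen;
}

static int
toy_outer (void)
{
  Toy_Object a = 0, b = 0;      /* initialized before protecting (rule 11) */
  struct toy_gcpro gcpro1, gcpro2;
  int seen;

  TOY_GCPRO2 (a, b);
  seen = toy_inner ();          /* 2 values here plus 3 in toy_inner */
  TOY_UNGCPRO;
  return seen;
}
```

If `toy_inner' returned without `TOY_UNGCPRO', `toy_gcprolist' would be
left pointing into a dead stack frame, which is the failure described
in rule 3.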
Given the extremely error-prone nature of the `GCPRO' scheme, and
the difficulty of tracking down the resulting bugs, it should be
considered a deficiency
in the XEmacs code. A solution to this problem would involve
implementing so-called "conservative" garbage collection for the C
stack. That involves looking through all of stack memory and treating
anything that looks like a reference to an object as a reference. This
will result in a few objects not getting collected when they should, but
it obviates the need for `GCPRO'ing, and allows garbage collection to
happen at any point at all, such as during object allocation.
File: internals.info, Node: Integers and Characters, Next: Allocation from Frob Blocks, Prev: GCPROing, Up: Allocation of Objects in XEmacs Lisp
Integers and Characters
=======================
Integer and character Lisp objects are created from integers using
the macros `XSETINT()' and `XSETCHAR()' or the equivalent functions
`make_int()' and `make_char()'. (These are actually macros on most
systems.) These functions basically just do some moving of bits
around, since the integral value of the object is stored directly in
the `Lisp_Object'.
`XSETINT()' and the like will truncate values given to them that are
too big; i.e. you won't get the value you expected but the tag bits
will at least be correct.
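The bit-moving and the truncation can be sketched as follows. The
layout used here is an assumption made for the example (a 32-bit word
with the mark bit on top, three tag bits below it, and a 28-bit value),
as are the `toy_' names and the tag number; the real macros and field
widths depend on the build.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch only:
   bit 31 = mark bit, bits 28..30 = tag, bits 0..27 = value.  */
#define TOY_VALBITS 28
#define TOY_VALMASK ((UINT32_C (1) << TOY_VALBITS) - 1)
#define TOY_MARKBIT (UINT32_C (1) << 31)
#define TOY_TAG_INT UINT32_C (1)    /* hypothetical tag number */

typedef uint32_t Toy_Object;

/* Build an integer object: the tag goes in the tag field, and the
   value is masked to fit.  Bits that do not fit are silently lost.  */
static Toy_Object
toy_make_int (int32_t n)
{
  return (TOY_TAG_INT << TOY_VALBITS) | ((uint32_t) n & TOY_VALMASK);
}

static unsigned
toy_tag (Toy_Object obj)
{
  return (unsigned) ((obj >> TOY_VALBITS) & 7u);
}

/* Extract the integer, discarding tag and mark bits and
   sign-extending the 28-bit value.  */
static int32_t
toy_xint (Toy_Object obj)
{
  uint32_t v = obj & TOY_VALMASK;
  assert (toy_tag (obj) == TOY_TAG_INT);
  if (v & (UINT32_C (1) << (TOY_VALBITS - 1)))
    return (int32_t) v - (INT32_C (1) << TOY_VALBITS);
  return (int32_t) v;
}
```

In this model, `toy_make_int' applied to 2 to the 28th power produces
an object whose value reads back as 0, but whose tag is still
`TOY_TAG_INT'.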
File: internals.info, Node: Allocation from Frob Blocks, Next: lrecords, Prev: Integers and Characters, Up: Allocation of Objects in XEmacs Lisp
Allocation from Frob Blocks
===========================
The uninitialized memory required by a `Lisp_Object' of a particular
type is allocated using `ALLOCATE_FIXED_TYPE()'. This only occurs
inside of the lowest-level object-creating functions in `alloc.c':